Effectiveness and Limitations of Statistical Spam Filters

Authors

M. Tariq Banday

Tariq R. Jan

Abstract

Spam not only clogs Internet traffic by consuming a hefty amount of network bandwidth but is also a source of e-mail-borne viruses, spyware, adware and Trojan horses. It is also used to carry out denial-of-service, directory harvesting and phishing attacks that directly cause financial losses. Further, spam often contains offensive, adult-oriented and fraudulent material that recipients find objectionable. Several anti-spam procedures are currently employed to distinguish spam from legitimate e-mail; however, spammers and phishers use dynamic spam structures that obfuscate e-mail content to circumvent these procedures. Apart from other technological procedures, various adaptive learning filters have been developed that allow an algorithm to learn continually what sort of e-mail, or e-mail content, a recipient would typically process and expect to see in the normal course of business. These filters are based on complex statistical techniques that classify future e-mails based on the word content of previously accepted e-mails. The statistical techniques employed in these filters split an incoming e-mail into tokens and assign a probability value to each token. The probabilities of the individual tokens are then combined to calculate an overall spam probability, and the incoming e-mail is accordingly scored as spam, probably spam or legitimate e-mail.
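
The token-based scoring described in the abstract can be illustrated with a short Python sketch. It is not the authors' method, and every concrete choice in it is an assumption: the tokenizer regex, the per-token estimate (the fraction of spam versus legitimate training messages that contain the token, clamped to [0.01, 0.99]), the naive-Bayes combination with equal priors, and the 0.9 / 0.1 thresholds separating "spam", "probably spam" and "legitimate". `TokenFilter`, `tokenize` and their methods are hypothetical names.

```python
import math
import re
from collections import Counter

# Assumed tokenizer: runs of lowercase letters, digits and a few symbols.
TOKEN_RE = re.compile(r"[a-z0-9$!'-]+")


def tokenize(text):
    """Split an e-mail into its set of distinct lowercase tokens."""
    return set(TOKEN_RE.findall(text.lower()))


class TokenFilter:
    """Hypothetical token-probability spam filter (illustrative only)."""

    def __init__(self, spam_threshold=0.9, legit_threshold=0.1):
        self.spam_counts = Counter()  # token -> number of spam messages containing it
        self.ham_counts = Counter()   # token -> number of legitimate messages containing it
        self.n_spam = 0
        self.n_ham = 0
        self.spam_threshold = spam_threshold    # assumed cut-off for "spam"
        self.legit_threshold = legit_threshold  # assumed cut-off for "legitimate"

    def train(self, text, is_spam):
        """Learn from one message whose label the recipient has confirmed."""
        tokens = tokenize(text)
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def token_probability(self, token):
        """Estimate P(spam | token) from document frequencies, clamped to [0.01, 0.99]."""
        spam_freq = self.spam_counts[token] / max(self.n_spam, 1)
        ham_freq = self.ham_counts[token] / max(self.n_ham, 1)
        if spam_freq + ham_freq == 0:
            return 0.5  # token never seen in training: treat as neutral
        p = spam_freq / (spam_freq + ham_freq)
        return min(0.99, max(0.01, p))  # no single token is allowed to be decisive

    def spam_probability(self, text):
        """Combine token probabilities: prod(p) / (prod(p) + prod(1 - p))."""
        probs = [self.token_probability(t) for t in tokenize(text)]
        # Sum log-odds instead of multiplying, so long messages do not underflow.
        diff = sum(math.log(p) - math.log(1.0 - p) for p in probs)
        # Numerically stable logistic function of the summed log-odds.
        if diff >= 0:
            return 1.0 / (1.0 + math.exp(-diff))
        z = math.exp(diff)
        return z / (1.0 + z)

    def classify(self, text):
        """Score a message as "spam", "probably spam" or "legitimate"."""
        p = self.spam_probability(text)
        if p >= self.spam_threshold:
            return "spam"
        if p > self.legit_threshold:
            return "probably spam"
        return "legitimate"
```

After training on a few labelled messages with `train`, `classify` returns one of the three labels named in the abstract. The sketch uses every token in the message. Some practical filters instead use only the most strongly indicative tokens, for example the fifteen used in Paul Graham's scheme. That restriction is omitted here for brevity.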

Related Resources

A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection

E-mail is one of the fastest and most economical forms of communication. The growing number of e-mail users has increased the amount of spam in recent years. As we know, spam not only wastes users' money, time and bandwidth but has also become a risk to the efficiency, reliability and security of networks. Spam developers are always trying to find ways to escape the existing filters; therefore new filters to de...

Good Word Attacks on Statistical Spam Filters

Unsolicited commercial email is a significant problem for users and providers of email services. While statistical spam filters have proven useful, senders of spam are learning to bypass these filters by systematically modifying their email messages. In a good word attack, one of the most common techniques, a spammer modifies a spam message by inserting or appending words indicative of legitima...

Machine Learning for Naive Bayesian Spam Filter Tokenization

Background: Traditional client-level spam filters rely on rule-based heuristics. While these filters can be effective, they have several limitations. The rules must be created by hand. This requires the filter creator to examine a corpus of spam and cull out characteristics. This is a time-consuming process, and it is easy to miss rules that are quite effective at detecting spam. While the word “...

Training SpamAssassin with Active Semi-supervised Learning

Most spam filters include some automatic pattern classifiers based on machine learning and pattern recognition techniques. Such classifiers often require a large training set of labeled emails to attain a good discriminant capability between spam and legitimate emails. In addition, they must be frequently updated because of the changes introduced by spammers to their emails to evade spam filter...

Spam Filter Analysis (arXiv:cs.CR/0402046v1, 19 Feb 2004)

Unsolicited bulk email (a.k.a. spam) is a major problem on the Internet. To counter spam, several techniques, ranging from spam filters to mail protocol extensions like hashcash, have been proposed. In this paper we investigate the effectiveness of several spam filtering techniques and technologies. Our analysis was performed by simulating email traffic under different conditions. We show that ge...

Journal: CoRR

Volume: abs/0910.2540

Publication date: 2009